A Novel Statistical Feature Selection Approach for Text Categorization
Similar resources
A novel feature selection algorithm for text categorization
With the development of the web, large numbers of documents are available on the Internet. Digital libraries, news sources, and internal company data keep growing. Automatic text categorization is becoming increasingly important for dealing with massive data. However, the major problem of text categorization is the high dimensionality of the feature space. At present there are many methods ...
Statistical Feature Selection Techniques for Arabic Text Categorization
This paper compares a few statistical feature selection techniques for Arabic text. Feature selection is especially important for text classification because, when dealing with text, the number of features/words increases rapidly. This makes the document-term matrix a sparse one which affects the performance of classifiers in terms of accuracy and in terms of processing time. One opts to reduce...
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
Imbalanced data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. Classification algorithms tend to favor the large class and might even treat the minority class data as outliers. The text data is one of t...
MMR-based Feature Selection for Text Categorization
We introduce a new method of feature selection for text categorization. Our MMR-based feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results show that MMR-based feature selection is more effective than Koller & Sahami’s method, which is one of greedy feature selection ...
Journal
Journal title: Journal of Information Processing Systems
Year: 2017
ISSN: 2092-805X
DOI: 10.3745/jips.02.0076